Evolution of the Influenza Virus Hemagglutinin Genes
Hemagglutinin (HA) is the major envelope glycoprotein of Influenza A and B viruses, and its homolog hemagglutinin-esterase (HE) is the major glycoprotein of Influenza C. These glycoproteins are important for viral entry into the host and for pathogenicity. Throughout history, Influenza A pandemics have been associated with changes to the HA glycoprotein.
Background
Influenza viruses have segmented, single-stranded, negative-sense RNA genomes that encode envelope glycoproteins, matrix proteins, nonstructural proteins, nucleoproteins and polymerase proteins. Influenza viruses are classified into types A, B and C based on the antigenic properties of their matrix proteins or nucleoproteins. Influenza A viruses are known to cause epidemics and pandemics in mammals, including humans, and in birds; aquatic birds are the natural reservoir for the virus. Influenza B and C viruses are mainly found in humans and are less pathogenic than Influenza A.
Hemagglutinin (HA) is the major envelope glycoprotein of A and B viruses, whereas hemagglutinin-esterase (HE) is the homolog in C viruses. HA (or HE) is cleaved into HA1 (HE1) and HA2 (HE2) to form the mature protein subunits. HA1 is a receptor-binding protein and the major target of the host's immune response, whereas HA2 is an anchor protein of the envelope that initiates fusion of the viral envelope with the host's cell membrane. Each HA monomer consists of the HA2 helical anchor chain topped with the HA1 subunit globule (2). The HA1 subunit targets specific sugar chains on the cells of the host organism. Once HA1 is bound, the HA2 subunit initiates the "attack" on the cell by mediating viral fusion. HA is also responsible for the agglutination of red blood cells, as the name implies (3). Viral particles are covered with HA molecules that attach to many red blood cells, causing clumping and shielding the virus from antibody neutralization.
Influenza A HA genes are further classified into 15 subtypes (H1-H15) according to their antigenic properties. Each HA subtype specializes in binding to a different cell type. For instance, H5 specializes in infecting the cells of the digestive system in birds without being particularly lethal, so this virus acts as an invisible reservoir in birds. Influenza A pandemics in humans arise when mutations in the HA genes of aquatic bird viruses introduce new subtypes of the HA protein. A recent virus appearing in the news is the H5N1 aquatic bird influenza, which is particularly lethal to birds but not to humans (3). However, this virus could be lethal to humans if the hemagglutinin mutated to a subtype capable of infecting human cells. Studies of the rate of evolution of HA genes can therefore inform how we prepare for future pandemics.
Evolution of Hemagglutinin
A few papers prior to Suzuki and Nei's research attempted to study the divergence rate of HA genes. However, issues with that research ultimately made its conclusions unreliable. Suzuki and Nei set out to determine the evolutionary relationships among Influenza A, B and C HA (HE) genes and the divergence rates of these genes.
Methods
To construct a phylogenetic tree, Suzuki and Nei used amino acid sequences of the HA2 (HE2) region of the HA protein. They collected sequences from the international DNA databank (DDBJ), excluding identical sequences from the same strain and laboratory-generated virus sequences. They obtained 57, 34, 58, 10, 29, 2, 41, 1, 4, 2, 1, 1, 3, 1, and 2 amino acid sequences from subtypes H1-H15 of Influenza A HA2s, respectively. They also collected 15 sequences from B viruses and 35 sequences from C viruses. All sequences were aligned with the program CLUSTAL W. After alignment gaps were removed, the remaining 207 amino acid sites were used to estimate p, Poisson correction (PC) and gamma distances.
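The three distance measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions: p is the proportion of differing sites, PC is d = -ln(1 - p), and the gamma distance is d = a[(1 - p)^(-1/a) - 1]. The two short sequences are hypothetical toy data, not taken from the study, and the default a = 1.20 is the gamma parameter the article reports for the divergence-time analysis.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two gap-free sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def poisson_correction(p: float) -> float:
    """Poisson correction (PC) distance: d = -ln(1 - p)."""
    return -math.log(1.0 - p)


def gamma_distance(p: float, a: float = 1.20) -> float:
    """Gamma distance: d = a * ((1 - p) ** (-1 / a) - 1)."""
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


# Hypothetical toy alignment (10 sites, 3 differences); not real HA2 data.
seq_1 = "MKTIIALSYI"
seq_2 = "MKAIIVLSYL"

p = p_distance(seq_1, seq_2)
pc = poisson_correction(p)
gamma = gamma_distance(p)
print(f"p = {p:.3f}, PC = {pc:.3f}, gamma = {gamma:.3f}")
```

Both corrections account for multiple substitutions at the same site, so they exceed p, and the gamma distance additionally allows the substitution rate to vary among sites.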
The phylogenetic tree was constructed using the neighbor-joining (NJ) method. NJ trees were also constructed from random samples of each H subtype of Influenza A and from random samples of the B and C viruses (see Table 1). In addition, Suzuki and Nei estimated the divergence time of HA genes in the Influenza A virus using amino acid sequences from the DDBJ. They used 50, 25, 24, 10, 21, 2, 25, 1, 4, 2, 1, 1, 3, 1, and 2 amino acid sequences for subtypes H1-H15, respectively. They made a multiple alignment of the 172 sequences in CLUSTAL W and removed all alignment gaps before estimating the gamma distances (a = 1.20). An NJ tree was constructed and branch lengths were determined using the ordinary least squares method to estimate the rate of amino acid substitution. Because the years of isolation were available for the sequences, Suzuki and Nei estimated the rate of amino acid substitution as the regression coefficient of the number of amino acid substitutions from a common root on the year of isolation. For aquatic birds, they estimated the rate from the NJ tree of duck A viruses, because ducks provided the largest sample (28 sequences). To estimate the divergence times, they constructed a linearized tree from the 28 samples using gamma distances with a = 1.20. Standard errors and 99% confidence intervals were estimated using the bootstrap method.
Results
The trees constructed with p (Figure 1), PC and gamma distances (the PC and gamma trees are not shown here) all have the same topology and show that the Influenza A HA2 gene diverged from the Influenza B HA2 gene. Each A virus HA2 subtype diverged after the split from the B HA2 gene. Suzuki and Nei estimated the years of divergence between duck Influenza A HA amino acid sequences and human and swine Influenza A HA amino acid sequences.
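The regression described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: it fits an ordinary least-squares line of substitutions per site against year of isolation, takes the slope as the substitution rate, and (as an assumption of this sketch) reads the year at which the fitted line reaches zero substitutions as the divergence year of the root node. The function name and the data points are hypothetical; the points are synthetic, generated from a rate of 1.0e-3 per site per year and a root year of 1900.

```python
def fit_rate_and_root_year(years, substitutions):
    """Least-squares fit of substitutions-per-site on year of isolation.

    Returns (rate, root_year): the slope of the line and the year at
    which the fitted line crosses zero substitutions.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(substitutions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(years, substitutions))
    rate = sxy / sxx                      # regression coefficient (slope)
    intercept = mean_y - rate * mean_x    # substitutions at "year 0"
    root_year = -intercept / rate         # x-intercept of the fitted line
    return rate, root_year


# Synthetic, noise-free example data (hypothetical, not from the paper).
years = [1930, 1940, 1950, 1960, 1970]
substitutions = [1.0e-3 * (year - 1900) for year in years]

rate, root_year = fit_rate_and_root_year(years, substitutions)
print(f"rate = {rate:.2e} per site per year, root year = {root_year:.0f}")
```

With real isolates the points scatter around the line, which is why the article reports bootstrap standard errors and confidence intervals alongside the estimates.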
The rates of divergence for human and swine amino acid sequences were easy to estimate because those rates were high. Divergence among the duck samples, however, was low and difficult to estimate. In the H1 subtype shown in Figure 2A, the rate of human amino acid sequence substitution was estimated to be 1.20e-3 per site per year (Figure 3A) and the year of divergence at node M was 1862. For this analysis, Suzuki and Nei used only amino acid sequences from viruses isolated before 1977, because later isolates are thought to derive from laboratory-originating viruses. They also estimated the rate of divergence at the same node (M) using classical swine sequences and found a rate of 0.56e-3 per site per year and an estimated year of divergence of 1836 (Figure 3B). They averaged these years and estimated that the year of divergence at node M was 1849. They also estimated the divergence year at node N using avian-like swine sequences and found that these sequences evolved at a rate of 1.87e-3 per site per year and that the year of divergence was 1965 (Figure 3C). By adding nodes M and N to the regression analysis for the duck sequences, they found that the duck sequences diverged at a rate of 3.89e-4 per site per year (Figure 3E). Similarly, from human sequences they estimated a rate of 2.03e-3 per site per year and a divergence year of 1946 at node O (Figure 3D). They then added node O to the regression analysis of the duck sequences and found a rate of divergence of 2.48e-4 per site per year (Figure 3F). Averaging the two duck estimates (3.89e-4 and 2.48e-4) gave a final evolution rate for duck Influenza HA genes of 3.19e-4 per site per year.
Finally, Suzuki and Nei constructed a linearized phylogenetic tree to estimate when the HAs (HEs) diverged from each other (Figure 4). The earliest divergence (node X) likely occurred about 2,000 years ago.
The latest divergence was at node T, where the H7 HA subtype diverged from the H15 HA subtype 379 years ago.
Conclusion
Suzuki and Nei found that the divergence of the A and B influenza virus HAs occurred before the divergence of the A virus HA subtypes H1-H15. This contradicts the earlier view that some type A HA subtypes diverged before the B virus HA. They also found that the amino acid substitution rate of HA in duck Influenza A viruses was slower than in human and swine A virus HAs, but similar to the rates in B virus HAs and C virus HEs. They hypothesized that mutation occurs more slowly in the natural reservoir because of a well-adapted immune response. In other hosts, such as swine or humans, the virus HAs (HEs) mutate at an accelerated rate due to variable immune response and functional constraints. There may be some error, however, because these viruses are in different hosts. In conclusion, Influenza A virus HA genes apparently evolve at a rate on the order of 10^-4 amino acid substitutions per site per year in the natural reservoir. These genes diverged into A, B and C groups several thousand years ago, whereas the Influenza A virus subtypes (H1-H15) diverged several thousand to several hundred years ago.
Influenza in the News
On December 3rd, 2013, Hong Kong confirmed its first case of H7N9 bird flu in a human. This virus was originally found in poultry in China and has infected nearly 100 people since it emerged earlier in 2013. The emergence of H7N9 in Hong Kong indicates that the virus is spreading, not only from bird to human, but from country to country. In addition, the virus may be transmitted from human to human, because the infected worker's caretakers are exhibiting flu-like symptoms. A statement released by the WHO reported a total of 139 human cases, including 45 deaths. See the article on BBC's website.
References
All information is from the primary article listed as reference 1 unless otherwise noted.
1. Suzuki and Nei. Origin and Evolution of Influenza Virus Hemagglutinin Genes. Dec 1, 2001. http://mbe.oxfordjournals.org/content/19/4/501.full
2. Wikipedia article: Hemagglutinin (influenza). http://en.wikipedia.org/wiki/Hemagglutinin_(influenza)
3. Protein Data Bank 101: Hemagglutinin. http://www.rcsb.org/pdb/101/motm.do?momID=76